Abstract

The study of the sub-structure of complex networks is of major importance to relate topology
and functionality. Many efforts have been devoted to the analysis of the modular structure of
networks using the quality function known as modularity. However, generally speaking, the
relation between topological modules and functional groups is still unknown, and depends on the
semantics of the links. Sometimes, we know in advance that many connections are transitive and,
as a consequence, triangles have a specific meaning. Here we propose the study of the modular
structure of networks considering triangles as the building blocks of modules. The method
generalizes the standard modularity and uses spectral optimization to find its maximum. We compare
the partitions obtained with those resulting from the optimization of the standard modularity in
several real networks. The results show that the information reported by the analysis of modules
of triangles complements the information of the classical modularity analysis.
1. Introduction

The study of the modular (or community) structure of complex networks has become a challenging
subject [1] with potential applications in many disciplines, ranging from sociology to
computer science, see reviews [2, 3, 4]. Understanding the modular units of graphs of interactions
(links) between nodes, representing people and their acquaintances, documents and their
citation relations, computers and their physical or logical connections, etc., is of utmost
importance to grasping knowledge about the functionality and performance of such systems. One of
the most successful approaches to identify the underlying modular structure of complex networks
has been the introduction of the quality function called modularity [5, 6]. Modularity
encompasses two goals: (i) it implicitly defines modules as those subgraphs that optimize this
quantity, and (ii) it provides a quantitative measure to find them via optimization algorithms. It
is based on the intuitive idea that random networks are not expected to exhibit modular structure
(communities) beyond fluctuations [7].

A lot of effort has been put into proposing reliable techniques to maximize modularity
[8, 9, 10, 11, 12, 13, 14, 15, 16], see review [17]. To a large extent, the success of modularity as
a quality function to analyze the modular structure of complex networks relies on its intrinsic
simplicity. The researcher interested in this analysis is endowed with a non-parametric function
to be optimized: modularity. The result of the analysis will provide a partition of the network
into communities such that the number of edges within each community is larger than the number
of edges one would expect to find by random chance. As a consequence, each community is a
subset of nodes more densely connected among themselves than with the rest of the nodes in the
network. The user has to be aware of resolution limitations that prevent grasping the
modular structure of networks at low scales using modularity [18]. The problem can be solved
using multiresolution methods [19, 20].

The mathematical formulation of modularity was proposed for unweighted and undirected
networks [5] and generalized later to weighted [6] and directed networks [21]. The generalized
definition is as follows:

$$ Q = \frac{1}{2w} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(C_i, C_j) , \qquad (1) $$

where $w_{ij}$ is the strength of the link between the nodes $i$ and $j$ of the network,
$w_i^{\mathrm{out}} = \sum_j w_{ij}$ is the strength of links going from $i$,
$w_j^{\mathrm{in}} = \sum_i w_{ij}$ is the strength of links coming to $j$, and the total
strength of the network is $2w = \sum_{ij} w_{ij}$. Finally, $C_i$ is the index of the community to
which node $i$ belongs, and $\delta(x, y)$ is the Kronecker function assigning 1 only if $x = y$,
and 0 otherwise.

A close look at Eq. (1) reveals that the building block of the community structure we are
looking for, within this formulation, is the link between two nodes. Every term in Eq. (1) accounts
for the difference, within a module, between the actual existence of a link with weight $w_{ij}$ and
the probability of existence of such a link just by chance, preserving the strength distribution.
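
As an illustration of Eq. (1), the following minimal sketch evaluates the standard modularity of a given partition. It is plain Python written for readability rather than speed; the function names and the toy graph (two triangles joined by a single link) are assumptions introduced here, not taken from the original work.

```python
def modularity(w, communities):
    """Modularity of Eq. (1) for a weight matrix w and one community label per node."""
    n = len(w)
    w_out = [sum(w[i][j] for j in range(n)) for i in range(n)]  # strength leaving i
    w_in = [sum(w[i][j] for i in range(n)) for j in range(n)]   # strength arriving at j
    total = sum(w_out)                                          # 2w in Eq. (1)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:                # Kronecker delta
                q += w[i][j] - w_out[i] * w_in[j] / total
    return q / total


def two_triangles():
    """Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3."""
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    w = [[0.0] * 6 for _ in range(6)]
    for a, b in edges:
        w[a][b] = w[b][a] = 1.0
    return w


w = two_triangles()
q_split = modularity(w, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(w, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

For this toy graph the split into the two triangles gives $Q = 5/14$, while placing all nodes in one community gives $Q = 0$, as expected from the null-model term.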

However, in many cases the minimal and functional structural entity of a graph is not a simple
link but a small structure (motif) of several nodes [22]. Motifs are small subgraphs that can be
found in a network and that correspond to a specific functional pattern of that network.
Statistical over-representation of motifs (compared with the random occurrence of these
sub-structures) has been a useful technique to determine minimum building blocks of functionality
in complex networks, and several works exploit their identification [22, 23, 24]. Among the
possible motifs, the simplest one is the triangle, which represents the basic unit of transitivity
and redundancy in a graph, see Figure 1. This motif is over-represented in many real networks; for
example, motifs 12 and 13 in Figure 1, the feedback with two mutual dyads and the fully connected
triad respectively, are characteristic motifs of the WWW. Motif 7 (feed-forward loop) is
over-represented in electronic circuits, neuronal connectivity and gene regulatory transcription
networks. The reason for this over-representation lies in the role such small subgraphs play in
the evolution and performance of the specific network. In the WWW as well as in social networks,
the fully connected triad is probably the result of the transitivity of contents or human
relations, respectively. The feed-forward loop is related to the reliability or fault tolerance of
the connections between important elements involved in communication chains. The idea we propose
here is that finding modules containing such motifs as building blocks could improve our
information about the modular structure of complex networks. The importance of transitivity can be
traced back to the seminal paper [25], which proposed the clustering coefficient, a scalar measure
quantifying the total number of triangles in a network through the average likelihood that two
neighbors of a vertex are neighbors themselves.

Figure 1: List of all possible three-node motifs.

The main goal of our work is to determine communities using triangular motifs as building
blocks. We propose an approach for triangle community detection based on modularity optimization
using a spectral decomposition and optimization algorithm. The resulting algorithm
identifies efficiently the best partition in communities of triangles of any given network,
optimizing the corresponding modularity function.

2. Spectral decomposition for triangle community detection 

Let $G = (V, A)$ be a weighted undirected graph representing a complex network, where $V$
is the set of vertices and $A$ the set of edges. The objective is to identify communities of
triangles, i.e. a partition with the requirement that the density of triangles formed by any three
nodes $i$, $j$ and $k$ inside the same module is larger than the density of triangles formed
outside the module. We will define this objective using a proper adaptation of modularity.

2.1. Triangle modularity tensor 

In [26] some of us introduced a mathematical formalism to cope with modularity of motifs of
any size. Capitalizing on this work, here we study the specificity of the triangle modularity
$Q_\triangle(C)$ of a certain partition $C$ of an undirected graph (the extension to directed
graphs is straightforward, although a little more intricate; we present this extension in the
Appendix). The mathematical definition is

$$ Q_\triangle(C) = \sum_i \sum_j \sum_k B_{ijk}\, \delta(C_i, C_j)\, \delta(C_j, C_k)\, \delta(C_k, C_i) , \qquad (2) $$

where $C_i$ is the index of the community to which node $i$ belongs, and

$$ B_{ijk} = \frac{1}{T_G}\, w_{ij} w_{jk} w_{ki} - \frac{1}{T_N}\, (w_i w_j)(w_j w_k)(w_k w_i) \qquad (3) $$

is a three-index mathematical object (triangle modularity tensor, from now on) that evaluates,
for each triad $i$, $j$, $k$, the difference between the actual density of strength of the triangle
in the graph and the expected density of this triangle in a random configuration with the same
strength distribution (null case). Here $w_i = \sum_j w_{ij}$ is the strength of node $i$. The
normalization constant $T_G$ is the total number of triads of nodes forming triangles in the
network,

$$ T_G = \sum_i \sum_j \sum_k w_{ij} w_{jk} w_{ki} , \qquad (4) $$

and its counterpart $T_N$ for the null case term is

$$ T_N = \sum_i \sum_j \sum_k (w_i w_j)(w_j w_k)(w_k w_i) . \qquad (5) $$

It is straightforward to check that the triangle modularity tensor satisfies:

$$ B_{ijk} = B_{jki} = B_{kij} , \qquad (6) $$

$$ \sum_i \sum_j \sum_k B_{ijk} = 0 . \qquad (7) $$
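
The definitions in Eqs. (2)-(7) can be checked numerically on a small example. The sketch below builds the dense tensor $B_{ijk}$ by brute force, which is only reasonable for very small graphs; the function names and the toy graph are assumptions made for illustration, and the code assumes the graph contains at least one triangle so that $T_G > 0$.

```python
from itertools import product


def triangle_tensor(w):
    """Dense triangle modularity tensor B_ijk of Eq. (3) for an undirected weight matrix."""
    n = len(w)
    s = [sum(row) for row in w]                       # node strengths w_i
    triads = list(product(range(n), repeat=3))
    t_g = sum(w[i][j] * w[j][k] * w[k][i] for i, j, k in triads)                 # Eq. (4)
    t_n = sum((s[i] * s[j]) * (s[j] * s[k]) * (s[k] * s[i]) for i, j, k in triads)  # Eq. (5)
    b = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, j, k in triads:
        actual = w[i][j] * w[j][k] * w[k][i] / t_g
        null = (s[i] * s[j]) * (s[j] * s[k]) * (s[k] * s[i]) / t_n
        b[i][j][k] = actual - null
    return b


def triangle_modularity(b, communities):
    """Triangle modularity of Eq. (2): sum B_ijk over triads inside one community."""
    n = len(b)
    return sum(b[i][j][k] for i, j, k in product(range(n), repeat=3)
               if communities[i] == communities[j] == communities[k])


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
w = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    w[u][v] = w[v][u] = 1.0

b = triangle_tensor(w)
q_tri = triangle_modularity(b, [0, 0, 0, 1, 1, 1])
b_total = sum(b[i][j][k] for i, j, k in product(range(6), repeat=3))
print(q_tri, b_total)
```

On this graph the partition into the two triangles yields $Q_\triangle = 0.75$, and the sum of all tensor entries vanishes up to rounding, in agreement with Eq. (7).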

2.2. Spectral optimization of triangle modularity 

The computation of the triangle modularity is demanding due to the combinatorial number
of triads that can be formed. Any optimization algorithm proposed for this function must
take this cost into account. Among the possibilities already stated in the literature, we find
that the spectral optimization scheme, first proposed in [16], is a candidate to perform this task
efficiently. The idea behind this algorithm is to use the eigenspectrum of the modularity matrix,
which plays a role in community detection similar to that played by the graph Laplacian, and use a
recursive splitting reminiscent of graph partitioning calculations. The problem we have is that a
direct mapping to the usual spectral modularity optimization is not straightforward given the
structure of Eq. (2). Basically, we need to transform Eq. (2) into a function with the following
structure:

$$ Q(C) \propto \sum_i \sum_j s_i M_{ij} s_j , \qquad (8) $$

where the leading eigenvector of $M_{ij}$, the modularity matrix, will induce the first recursion
step, splitting the network in two parts.

We propose the following transformation: let us assume a partition of the network in two
communities, introducing the variables $s_i$, which are +1 or -1 depending on the community to
which node $i$ belongs, and taking into account that

$$ \delta(C_i, C_j) = \frac{1}{2}(1 + s_i s_j) , \qquad (9) $$

then

$$ \delta(C_i, C_j)\,\delta(C_j, C_k)\,\delta(C_k, C_i) = \frac{1}{8}(1 + s_i s_j)(1 + s_j s_k)(1 + s_k s_i) = \frac{1}{4}(1 + s_i s_j + s_j s_k + s_k s_i) , \qquad (10) $$

where we have made use of $s_i^2 = +1$. Therefore, using Eqs. (6) and (7),

$$ Q_\triangle(S) = \frac{1}{4} \sum_i \sum_j \sum_k B_{ijk} (1 + s_i s_j + s_j s_k + s_k s_i) = \frac{3}{4} \sum_i \sum_j \sum_k B_{ijk}\, s_i s_j . \qquad (11) $$

Defining the triangle modularity matrix

$$ M_{ij} = \sum_k B_{ijk} = \frac{1}{T_G}\, w_{ij} \sum_k w_{jk} w_{ki} - \frac{1}{T_N}\, (w_i w_i)(w_j w_j) \sum_k (w_k w_k) , \qquad (12) $$

then

$$ Q_\triangle(S) = \frac{3}{4} \sum_i \sum_j s_i M_{ij} s_j . \qquad (13) $$

Thus, we have been able to reduce the optimization of the triangle modularity to the standard
spectral algorithm given in [16].
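
The reduction from the three-index sum of Eq. (2) to the quadratic form of Eq. (13) can be verified numerically. The following sketch, with function names and a toy graph chosen here purely for illustration, builds $M_{ij}$ from Eq. (12) without storing the full tensor and compares the quadratic form with a brute-force evaluation of Eq. (2) for two different bipartitions.

```python
def triangle_modularity_matrix(w):
    """Triangle modularity matrix M_ij of Eq. (12) for an undirected weight matrix."""
    n = len(w)
    s = [sum(row) for row in w]                     # node strengths w_i
    sum_sq = sum(x * x for x in s)                  # sum_k w_k w_k
    # paths[j][i] = sum_k w_jk w_ki (two-step paths closing the link i-j)
    paths = [[sum(w[j][k] * w[k][i] for k in range(n)) for i in range(n)]
             for j in range(n)]
    t_g = sum(w[i][j] * paths[j][i] for i in range(n) for j in range(n))  # Eq. (4)
    t_n = sum_sq ** 3                                                     # Eq. (5)
    return [[w[i][j] * paths[j][i] / t_g - (s[i] * s[i]) * (s[j] * s[j]) * sum_sq / t_n
             for j in range(n)] for i in range(n)]


def q_from_matrix(m, signs):
    """Quadratic form of Eq. (13)."""
    n = len(m)
    return 0.75 * sum(signs[i] * m[i][j] * signs[j] for i in range(n) for j in range(n))


def q_brute_force(w, signs):
    """Direct evaluation of Eq. (2) for a bipartition encoded by +1/-1 signs."""
    n = len(w)
    s = [sum(row) for row in w]
    triads = [(i, j, k) for i in range(n) for j in range(n) for k in range(n)]
    t_g = sum(w[i][j] * w[j][k] * w[k][i] for i, j, k in triads)
    t_n = sum(s[i] ** 2 * s[j] ** 2 * s[k] ** 2 for i, j, k in triads)
    return sum(w[i][j] * w[j][k] * w[k][i] / t_g - s[i] ** 2 * s[j] ** 2 * s[k] ** 2 / t_n
               for i, j, k in triads if signs[i] == signs[j] == signs[k])


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
w = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    w[u][v] = w[v][u] = 1.0

m = triangle_modularity_matrix(w)
good_split = [1, 1, 1, -1, -1, -1]   # one group per triangle
bad_split = [1, 1, -1, -1, -1, -1]   # breaks the first triangle
for signs in (good_split, bad_split):
    print(q_from_matrix(m, signs), q_brute_force(w, signs))
```

Both evaluations agree for each bipartition, and the split that keeps the triangles intact scores much higher than the one that breaks a triangle.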

For the case of undirected networks, this matrix is symmetric and the computation of its 
eigenspectra gives real values. However, if the network is directed, this property is not necessar- 
ily true, and then a symmetrization of the matrix is needed before computing its spectrum (see 
Appendix). 

Once a first division of the network in two parts has been obtained, it is possible to iterate the
process, while modularity improves, by a recursive application of the spectral splitting to each
subgraph. To this end, we need the value of the triangle modularity matrix for any subgraph.
Supposing we have a subgraph $g$ to be divided into $g_1$ and $g_2$, the change in triangle
modularity is given by

$$ \begin{aligned}
\Delta Q_\triangle(g \to g_1, g_2) &= \sum_{i,j,k \in g_1} B_{ijk} + \sum_{i,j,k \in g_2} B_{ijk} - \sum_{i,j,k \in g} B_{ijk} \\
&= \frac{3}{4} \left[ \sum_{i,j \in g} \Big( \sum_{k \in g} B_{ijk} \Big) s_i s_j - \sum_{i,j,k \in g} B_{ijk} \right] \\
&= \frac{3}{4} \sum_{i,j \in g} s_i M_{ij}(g)\, s_j ,
\end{aligned} \qquad (14) $$

where

$$ M_{ij}(g) = \sum_{k \in g} \left( B_{ijk} - \delta_{ij} \sum_{l \in g} B_{ilk} \right) \qquad (15) $$

and $s_i$ is +1 for nodes in $g_1$ and -1 for nodes in $g_2$. Therefore, the new triangle modularity
matrix is not just a submatrix of the original one, but additional terms appear to take into
account the connectivity with the rest of the network.
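
A small numerical check of Eqs. (14) and (15) may help. The sketch below, using a hypothetical chain of three triangles and helper names introduced only for this example, computes $M_{ij}(g)$ for a subgraph $g$ and compares the quadratic form with the direct difference of tensor sums. It relies on the dense tensor and is therefore meant for tiny graphs only.

```python
from itertools import product


def triangle_tensor(w):
    """Dense tensor B_ijk of Eq. (3); assumes at least one triangle in the graph."""
    n = len(w)
    s = [sum(row) for row in w]
    triads = list(product(range(n), repeat=3))
    t_g = sum(w[i][j] * w[j][k] * w[k][i] for i, j, k in triads)
    t_n = sum((s[i] * s[j]) * (s[j] * s[k]) * (s[k] * s[i]) for i, j, k in triads)
    b = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, j, k in triads:
        b[i][j][k] = (w[i][j] * w[j][k] * w[k][i] / t_g
                      - (s[i] * s[j]) * (s[j] * s[k]) * (s[k] * s[i]) / t_n)
    return b


def subgraph_matrix(b, g):
    """M_ij(g) of Eq. (15), indexed by position within the node list g."""
    m = [[sum(b[i][j][k] for k in g) for j in g] for i in g]
    for pos, i in enumerate(g):
        m[pos][pos] -= sum(b[i][l][k] for l in g for k in g)   # delta_ij correction
    return m


def tensor_sum(b, nodes):
    """Sum of B_ijk over all triads inside a node set."""
    return sum(b[i][j][k] for i, j, k in product(nodes, repeat=3))


# Hypothetical chain of three triangles: {0,1,2} - {3,4,5} - {6,7,8}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (6, 7), (7, 8), (6, 8),
         (2, 3), (5, 6)]
w = [[0.0] * 9 for _ in range(9)]
for u, v in edges:
    w[u][v] = w[v][u] = 1.0

b = triangle_tensor(w)
g1, g2 = [3, 4, 5], [6, 7, 8]
g = g1 + g2
signs = [1, 1, 1, -1, -1, -1]              # +1 for g1, -1 for g2

m_g = subgraph_matrix(b, g)
delta_matrix = 0.75 * sum(signs[a] * m_g[a][c] * signs[c]
                          for a in range(len(g)) for c in range(len(g)))
delta_direct = tensor_sum(b, g1) + tensor_sum(b, g2) - tensor_sum(b, g)
print(delta_matrix, delta_direct)
```

The two values coincide, and the positive result indicates that separating the two triangles of $g$ increases the triangle modularity.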



Algorithm 1 Triangle community detection

Require: Connected network G(V,E)
Ensure: Triangle communities C, triangle modularity of the partition Q_△(C)

 1: Read network
 2: Current subgraph g ← G
 3: Build modularity matrix M(g)
 4: Compute Q_△(g)
 5: Compute leading eigenvalue and eigenvector of M(g)
 6: Decomposition of group g in two groups: g1 and g2, using the signs of eigenvector components
 7: Compute the modularity Q_△(g1, g2) of the initial split of group g
 8: Improve Q_△(g1, g2) using KL optimization between g1 and g2
 9: Compute the modularity Q_△(g1, g2) of the split of group g
10: if Q_△(g1, g2) > Q_△(g) then
11:     goto 3 with g ← g1
12:     goto 3 with g ← g2
13: end if
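
The recursion of Algorithm 1 can be sketched in a few lines of Python. This is a simplified illustration under several assumptions that are not part of the original method: the Kernighan-Lin refinement (steps 8 and 9) is omitted, the leading eigenvector is obtained with a shifted power iteration instead of a Lanczos solver, and the dense tensor is used, so the code is only suitable for tiny graphs. All function names are hypothetical.

```python
from itertools import product


def triangle_tensor(w):
    """Dense tensor B_ijk of Eq. (3); assumes at least one triangle in the graph."""
    n = len(w)
    s = [sum(row) for row in w]
    triads = list(product(range(n), repeat=3))
    t_g = sum(w[i][j] * w[j][k] * w[k][i] for i, j, k in triads)
    t_n = sum((s[i] * s[j]) * (s[j] * s[k]) * (s[k] * s[i]) for i, j, k in triads)
    b = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, j, k in triads:
        b[i][j][k] = (w[i][j] * w[j][k] * w[k][i] / t_g
                      - (s[i] * s[j]) * (s[j] * s[k]) * (s[k] * s[i]) / t_n)
    return b


def subgraph_matrix(b, g):
    """M_ij(g) of Eq. (15), indexed by position within the node list g."""
    m = [[sum(b[i][j][k] for k in g) for j in g] for i in g]
    for pos, i in enumerate(g):
        m[pos][pos] -= sum(b[i][l][k] for l in g for k in g)
    return m


def leading_eigenvector(m, iterations=1000):
    """Power iteration on m + shift*I, so the largest (not largest-magnitude) eigenvalue wins."""
    n = len(m)
    shift = max(sum(abs(x) for x in row) for row in m)
    v = [float(i + 1) for i in range(n)]          # deliberately non-symmetric start
    for _ in range(iterations):
        nxt = [sum(m[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in nxt]
    return v


def split_recursively(b, g):
    """Bisect g by eigenvector signs while the gain of Eq. (14) is positive."""
    if len(g) < 2:
        return [g]
    m = subgraph_matrix(b, g)
    signs = [1 if x >= 0 else -1 for x in leading_eigenvector(m)]
    gain = 0.75 * sum(signs[a] * m[a][c] * signs[c]
                      for a in range(len(g)) for c in range(len(g)))
    g1 = [node for node, sign in zip(g, signs) if sign > 0]
    g2 = [node for node, sign in zip(g, signs) if sign < 0]
    if not g1 or not g2 or gain <= 1e-12:
        return [g]                                 # no improving split: keep g whole
    return split_recursively(b, g1) + split_recursively(b, g2)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
w = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    w[u][v] = w[v][u] = 1.0

communities = split_recursively(triangle_tensor(w), list(range(6)))
print(sorted(sorted(c) for c in communities))
```

On this toy graph the first bisection separates the two triangles, and neither triangle is split further because every further bisection has a negative gain.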



2.3. Algorithm 

Once the triangle modularity has been transformed to the proper form to be optimized by
spectral decomposition, we can proceed to formulate a complete decomposition-optimization
algorithm. After the first analysis of the eigenspectrum, the eigenvector associated with the
largest eigenvalue is used to assign each element to one of the two communities
according to the sign of its eigenvector component. This process is recursively executed until
no new splits are obtained. The decomposition given by the spectral partitioning can be improved
by a fine-tuning of the node assignments after the process ends.

We use the Kernighan-Lin (KL) optimization method to improve the modularity as explained in
[16]. The main idea is to move vertices from one group to the other so as to increase the
modularity. We move all vertices exactly once. At each step, we choose to move the vertex giving
the best improvement (largest increase in the modularity). When all vertices have been moved, we
repeat the process until no improvement is possible. Some computational issues should be
considered here: the largest eigenvalue and its corresponding eigenvector can be efficiently
determined using the iterative Lanczos method [27]; the computation of $Q_\triangle(S)$ is, in
principle, of order $O(N^3)$, however it can be done very efficiently by pre-computing and storing
the values of $T_G$ and $T_N$, and the lists of triangles to which each node belongs; finally, the
KL post-processing stage, which is eventually the computational bottleneck of the process, must be
parameterized according to the number of nodes we intend to move and the relative improvement of
modularity observed.
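
The pre-computation of the lists of triangles per node mentioned above could look like the following sketch for an unweighted undirected graph. The function name, the edge-list input format and the example graph are assumptions made for illustration; node labels are assumed to be mutually comparable.

```python
from itertools import combinations


def triangles_per_node(edges):
    """Map each node to the list of triangles (a, b, c), with a < b < c, it belongs to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    per_node = {node: [] for node in neighbors}
    for a in neighbors:
        # combinations over sorted neighbors guarantees b < c
        for b, c in combinations(sorted(neighbors[a]), 2):
            if a < b and c in neighbors[b]:       # enumerate each triangle once
                for node in (a, b, c):
                    per_node[node].append((a, b, c))
    return per_node


# Hypothetical graph: two triangles joined by the link 2-3, plus a pendant node 6.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
tri = triangles_per_node(edges)
print(tri)
```

Nodes with an empty list, such as the pendant node in this example, do not participate in any triangle and therefore contribute only through the null-case term.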

3. Results 

In this section we show the results of the algorithm, applied to several real networks. We 
have used the following networks: 

• Football [1], a network of American football games between Division IA colleges during
regular season Fall 2000.

• Zachary [28], a social network of friendships between 34 members of a karate club at a
US university in the 1970s.

• Dolphins [29], an undirected social network of frequent associations between 62 dolphins
in a community living off Doubtful Sound, New Zealand.

• Adjnoun [30], adjacency network of common adjectives and nouns in the novel David
Copperfield by Charles Dickens.

• Elec s208 [22], benchmark of sequential logic electronic circuit.

• Neurons [31], network of neural connectivity of the nematode C. elegans.

• Cortex [32], network of connections between cortical areas in the cat brain.

Network      Nodes   Links   $Q$      $Q_\triangle$   $\Delta(Q, Q_\triangle)$
Football     115     613     0.604    0.924            0.529
Zachary      34      78      0.419    0.706            0.685
Dolphins     62      159     0.528    0.817            0.547
Adjnoun      112     425     0.308    0.299           -0.029
Elec s208    122     189     0.686    0.998            0.454
Neurons      279     2287    0.405    0.433            0.069
Cortex       55      564     0.372    0.708            0.903

Table 1: Comparison of standard and triangle modularities.



To evaluate the information provided by the new triangle modularity, we perform a comparison
with the standard modularity of Eq. (1). We compare both the values of the
optimal modularity and the partitions obtained.

3.1. Modularities comparison 

Table 1 shows the best standard and triangle modularities found using spectral optimization.
We define a new parameter $\Delta(Q, Q_\triangle) = (Q_\triangle - Q)/Q$ that measures the relative
difference between both. Positive values of $\Delta(Q, Q_\triangle)$ indicate that the contribution
of triangles to communities is larger than that of standard modularity communities, and the
contrary for negative values.
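
As a quick worked example, the relative difference can be recomputed from the rounded modularity values listed in Table 1. The snippet below is only an arithmetic check; the function and variable names are chosen here for illustration, and small discrepancies in the last digit are possible for other rows because the table entries are rounded.

```python
def relative_difference(q, q_triangle):
    """Relative difference between triangle and standard modularity."""
    return (q_triangle - q) / q


# (Q, Q_triangle) pairs copied from Table 1.
table1 = {
    "Zachary": (0.419, 0.706),
    "Adjnoun": (0.308, 0.299),
    "Cortex": (0.372, 0.708),
}
deltas = {name: round(relative_difference(q, qt), 3) for name, (q, qt) in table1.items()}
print(deltas)
```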

From Table 1 we observe that in Adjnoun, which is almost a bipartite network, the standard
modularity is larger than the triangle modularity, in accordance with the scarcity of these motifs.
On the other hand, for the Zachary network, a human social network where transitivity is implicit
in many acquaintances, the triangle modularity becomes more informative than the standard
modularity. Indeed, the optimal standard modularity proposes a decomposition of this network
in four groups, while the optimal triangle modularity is achieved for a partition in two groups
plus two isolated nodes (nodes 10 and 12) that do not participate in any triangle. Moreover,
the partition in two groups is in accordance with the observed split of this network after a fight
between the administrator and the instructor of the club, see Figure 2.



Figure 2: Zachary network partitions. Best partitions found by optimization of (a) triangle
modularity and (b) standard modularity. The real splitting of the network is represented by the
shape of the symbols (squares and circles). Colors indicate the assignment of nodes to the modules
found.



3.2. Communities comparison 

A deeper comparison consists in analyzing the different modules obtained using the standard
and triangle modularity. To this end, we need some measures to analyze the difference in the
assignments of nodes to modules, taking into account that we will also have different modular
partitions. Here, we use two measures, the Normalized Mutual Information (NMI) and the
Asymmetric Wallace Index (AW).

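In [33] the authors define the NMI to compare two clusterings. The idea is the following: let
A be a clustering with $c_A$ communities and B a clustering with $c_B$ communities, and let us
define the confusion matrix $\mathbf{N}$ whose rows correspond to the communities of the first
clustering (A) and whose columns correspond to the communities of the second clustering (B). The
elements of the confusion matrix, $N_{\alpha\beta}$, represent the number of common nodes between
community $\alpha$ of the clustering A and community $\beta$ of the clustering B, the partial sums
$N_{\alpha\cdot} = \sum_\beta N_{\alpha\beta}$ and $N_{\cdot\beta} = \sum_\alpha N_{\alpha\beta}$
are the sizes of these communities, and $N = \sum_\alpha \sum_\beta N_{\alpha\beta}$ is the total
number of nodes. The measure NMI between two clusterings A and B is

$$ \mathrm{NMI}(A, B) = \frac{ -2 \sum_{\alpha=1}^{c_A} \sum_{\beta=1}^{c_B} N_{\alpha\beta} \log\left( \frac{N_{\alpha\beta}\, N}{N_{\alpha\cdot}\, N_{\cdot\beta}} \right) }{ \sum_{\alpha=1}^{c_A} N_{\alpha\cdot} \log\left( \frac{N_{\alpha\cdot}}{N} \right) + \sum_{\beta=1}^{c_B} N_{\cdot\beta} \log\left( \frac{N_{\cdot\beta}}{N} \right) } . \qquad (16) $$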

If the partitions are identical, then NMI takes its maximum value of 1. If the partitions are
totally independent, NMI = 0. It measures the amount of information that both partitions have
in common.

Networks     NMI      $AW_1$   $AW_2$
Football     0.8903   0.8488   0.6901
Zachary      0.6380   0.7945   0.5524
Dolphins     0.6663   0.4810   0.7838
Adjnoun      0.4888   0.3136   0.3845
Elec s208    0.6098   0.0307   0.9091
Neurons      0.6045   0.7276   0.6954
Cortex       0.8361   0.6841   1.0000

Table 2: Comparison of partitions obtained using standard and triangle modularities. The
different measures are explained in the text.
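
The NMI of Eq. (16) can be computed directly from two lists of community labels. The following is a minimal sketch; the function names and the example labelings are assumptions made here, and the convention of returning 1 when both partitions consist of a single community (where Eq. (16) is undefined) is a choice of this sketch.

```python
import math


def confusion_matrix(a, b):
    """N[alpha][beta]: number of nodes in community alpha of A and community beta of B."""
    labels_a, labels_b = sorted(set(a)), sorted(set(b))
    n = [[0] * len(labels_b) for _ in labels_a]
    for x, y in zip(a, b):
        n[labels_a.index(x)][labels_b.index(y)] += 1
    return n


def nmi(a, b):
    """Normalized mutual information of Eq. (16) between two label lists."""
    n = confusion_matrix(a, b)
    total = len(a)
    rows = [sum(r) for r in n]                 # community sizes in A
    cols = [sum(c) for c in zip(*n)]           # community sizes in B
    num = -2.0 * sum(n_ab * math.log(n_ab * total / (rows[i] * cols[j]))
                     for i, r in enumerate(n) for j, n_ab in enumerate(r) if n_ab > 0)
    den = (sum(x * math.log(x / total) for x in rows)
           + sum(x * math.log(x / total) for x in cols))
    if den == 0.0:                             # both partitions are a single community
        return 1.0
    return num / den


identical = nmi([0, 0, 1, 1], [5, 5, 7, 7])    # same grouping, different label names
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # groupings carry no shared information
print(identical, independent)
```

Empty cells of the confusion matrix are skipped, following the usual convention that $0 \log 0 = 0$.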



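The Asymmetric Wallace Index [34] is the probability that a pair of elements in one cluster of
partition A (resp. B) is also in the same cluster of partition B (resp. A). Using the same
definitions as for the NMI, the two possible Asymmetric Wallace Indices are:

$$ AW_1(A, B) = \frac{ \sum_{\alpha=1}^{c_A} \sum_{\beta=1}^{c_B} N_{\alpha\beta} (N_{\alpha\beta} - 1) }{ \sum_{\alpha=1}^{c_A} N_{\alpha\cdot} (N_{\alpha\cdot} - 1) } , \qquad (17) $$

$$ AW_2(A, B) = \frac{ \sum_{\alpha=1}^{c_A} \sum_{\beta=1}^{c_B} N_{\alpha\beta} (N_{\alpha\beta} - 1) }{ \sum_{\beta=1}^{c_B} N_{\cdot\beta} (N_{\cdot\beta} - 1) } . \qquad (18) $$

The asymmetric Wallace index shows the inclusion of a partition in the other.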
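
The two indices can be computed by counting co-clustered pairs, as in the sketch below. The function name and the example partitions are assumptions made for illustration, and the code assumes that each partition has at least one community with two or more nodes, so that the denominators are non-zero.

```python
from collections import Counter


def wallace_indices(a, b):
    """Return (AW1, AW2) of Eqs. (17) and (18) for two label lists a and b."""
    joint = Counter(zip(a, b))                 # N_{alpha,beta}
    size_a = Counter(a)                        # N_{alpha .}
    size_b = Counter(b)                        # N_{. beta}
    pairs_both = sum(n * (n - 1) for n in joint.values())
    pairs_a = sum(n * (n - 1) for n in size_a.values())
    pairs_b = sum(n * (n - 1) for n in size_b.values())
    return pairs_both / pairs_a, pairs_both / pairs_b


coarse = [0, 0, 0, 0, 1, 1]    # partition A: two communities
fine = [0, 0, 1, 1, 2, 2]      # partition B: each community lies inside one of A
aw1, aw2 = wallace_indices(coarse, fine)
print(aw1, aw2)
```

In this example every community of B is contained in a community of A, so $AW_2 = 1$, while $AW_1 = 3/7$ reflects that many pairs grouped together in A are separated in B.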

In Table 2 we observe that the largest NMI is for the communities of the football network. That
means that the standard and triangle communities found in that network are very similar. Indeed,
the structure of the football network is very dense and almost all nodes participate in triangles.
For the cortex network, $AW_2$ is equal to 1, which means that all the triangle communities
are included in the standard ones.



4. Conclusions 

We have designed an algorithm to compute the communities of triangular motifs using a
spectral decomposition of the triangle modularity matrix. The algorithm provides partitions
where transitive relations are the building blocks of their internal structure. The results of
these partitions are complementary to those obtained by maximizing the classical modularity, which
accounts only for individual links, and can be used to improve our knowledge of the mesoscopic
structure of complex networks.



5. Appendix 

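Here we show the computation of the triangle modularity matrix for a directed motif, in
particular motif 7 in Figure 1, although, as will be shown, the process is equivalent for any other
motif configuration. In this case, we have

$$ Q_\triangle(C) = \sum_i \sum_j \sum_k B_{ijk}\, \delta(C_i, C_j)\, \delta(C_j, C_k)\, \delta(C_k, C_i) , \qquad (19) $$

where $B_{ijk}$ is

$$ B_{ijk} = \frac{1}{T_G}\, w_{ij} w_{jk} w_{ik} - \frac{1}{T_N}\, (w_i^{\mathrm{out}} w_j^{\mathrm{in}})(w_j^{\mathrm{out}} w_k^{\mathrm{in}})(w_i^{\mathrm{out}} w_k^{\mathrm{in}}) . \qquad (20) $$

The normalization constants $T_G$ and $T_N$ are now

$$ T_G = \sum_i \sum_j \sum_k w_{ij} w_{jk} w_{ik} , \qquad (21) $$

and

$$ T_N = \sum_i \sum_j \sum_k (w_i^{\mathrm{out}} w_j^{\mathrm{in}})(w_j^{\mathrm{out}} w_k^{\mathrm{in}})(w_i^{\mathrm{out}} w_k^{\mathrm{in}}) . \qquad (22) $$

Using the transformation proposed in Eq. (10),

$$ M_{ij} = \sum_k B_{ijk} = \frac{1}{T_G}\, w_{ij} \sum_k w_{jk} w_{ik} - \frac{1}{T_N}\, (w_i^{\mathrm{out}})^2 (w_j^{\mathrm{in}} w_j^{\mathrm{out}}) \sum_k (w_k^{\mathrm{in}})^2 , \qquad (23) $$

then

$$ Q_\triangle(S) = \frac{3}{4} \sum_i \sum_j s_i M_{ij} s_j . \qquad (24) $$

Owing to the fact that the graph is directed, the modularity matrix $M_{ij}$ may not be symmetric,
which causes technical problems. However, it is possible to restore the symmetry thanks to the
scalar nature of $Q_\triangle(S)$ [35]. A symmetrization of the triangle modularity matrix $M$,

$$ M' = \frac{1}{2}(M + M^T) , \qquad (25) $$

yields

$$ Q_\triangle(S) = \frac{3}{4} \sum_i \sum_j s_i M'_{ij} s_j , \qquad (26) $$

recovering the necessary symmetry to apply the standard spectral optimization.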
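
The symmetrization step works because a quadratic form depends only on the symmetric part of its matrix. The short sketch below illustrates this with an arbitrary non-symmetric matrix; the matrix entries, the sign vector and the function names are made up for the example and do not come from any of the networks studied.

```python
def quadratic_form(m, s):
    """Scalar sum_ij s_i m_ij s_j."""
    n = len(m)
    return sum(s[i] * m[i][j] * s[j] for i in range(n) for j in range(n))


def symmetrize(m):
    """Symmetric part (M + M^T) / 2 of a square matrix, as in Eq. (25)."""
    n = len(m)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]


# Arbitrary non-symmetric matrix and sign vector, for illustration only.
m = [[0.0, 2.0, -1.0],
     [0.0, 1.0, 3.0],
     [4.0, 0.0, 0.0]]
s = [1, -1, 1]

m_sym = symmetrize(m)
print(quadratic_form(m, s), quadratic_form(m_sym, s))
```

Both calls return the same value, while only the symmetrized matrix is guaranteed to have a real eigenspectrum.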

In the same manner, we can define the modularity matrix for all possible motifs of Figure 1
just by modifying $B_{ijk}$. For example, for motif 13 in Figure 1 we have:

$$ B_{ijk} = \frac{1}{T_G}\, w_{ij} w_{ji} w_{jk} w_{kj} w_{ki} w_{ik} - \frac{1}{T_N}\, (w_i^{\mathrm{out}})^2 (w_i^{\mathrm{in}})^2 (w_j^{\mathrm{out}})^2 (w_j^{\mathrm{in}})^2 (w_k^{\mathrm{out}})^2 (w_k^{\mathrm{in}})^2 , \qquad (27) $$

$$ T_G = \sum_i \sum_j \sum_k w_{ij} w_{ji} w_{jk} w_{kj} w_{ki} w_{ik} \qquad (28) $$